BUG: .describe() doesn't work for EAs #61707 #61760

kernelism · 2025-07-02T11:41:36Z

This PR fixes a bug where Series.describe() fails on certain ExtensionArray dtypes, such as pint[kg], because it attempts to cast the result to Float64Dtype. Some of the computed statistics cannot be cast to float, so the cast raises errors such as DimensionalityError.

We now avoid forcing a Float64Dtype return dtype when the EA’s scalar values cannot be safely cast. Instead, if the EA produces statistics with mixed dtypes, the result is returned with dtype=None.
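As a rough illustration of the check described above, the sketch below flags a list of summary statistics whose entries do not all share one Python type. The helper name mirrors `has_multiple_internal_dtypes` from the diff, but its body is not shown on this page, so the implementation here is an assumption. `FakeQuantity` is a hypothetical stand-in for a unit-carrying scalar, and the sample statistics are made up for illustration.

```python
from typing import Any


class FakeQuantity:
    """Hypothetical stand-in for a unit-carrying scalar (not a real pint type)."""

    def __init__(self, magnitude: float, unit: str) -> None:
        self.magnitude = magnitude
        self.unit = unit


def has_multiple_internal_dtypes(d: list[Any]) -> bool:
    # Assumed implementation: the statistics count as "mixed" when more
    # than one Python type appears among them.
    return len({type(x) for x in d}) > 1


# Illustrative statistics: a plain integer count next to unit-carrying values.
stats = [3, FakeQuantity(2.0, "kg"), FakeQuantity(1.0, "kg"), FakeQuantity(3.0, "kg")]

mixed = has_multiple_internal_dtypes(stats)

# For mixed statistics, leave the dtype unset (None) instead of forcing Float64.
result_dtype = None if mixed else "Float64"
```

In the diff, the check sits in an `elif` after the existing dtype branches, so it is only reached when those branches do not apply; the sketch does not reproduce that surrounding logic.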

closes BUG: .describe() doesn't work for EAs #61707
Adds a regression test.
pre-commit checks passed
Adds type annotations
Adds a whatsnew entry


* [pre-commit.ci] pre-commit autoupdate

  updates:
  - [github.com/astral-sh/ruff-pre-commit: v0.11.12 → v0.12.2](astral-sh/ruff-pre-commit@v0.11.12...v0.12.2)
  - [github.com/MarcoGorelli/cython-lint: v0.16.6 → v0.16.7](MarcoGorelli/cython-lint@v0.16.6...v0.16.7)
  - [github.com/pre-commit/mirrors-clang-format: v20.1.5 → v20.1.7](pre-commit/mirrors-clang-format@v20.1.5...v20.1.7)
  - [github.com/trim21/pre-commit-mirror-meson: v1.8.1 → v1.8.2](trim21/pre-commit-mirror-meson@v1.8.1...v1.8.2)
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci
* Rename method
* ignore PLW0177
* Noqa test

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Matthew Roeschke <[email protected]>


…_csv (pandas-dev#61650)

* feature pandas-dev#49580: support new-style float_format string in to_csv

  feat(to_csv): support new-style float_format strings using str.format

  Detect and process new-style format strings (e.g., "{:,.2f}") in the float_format parameter of to_csv.

  - Check if float_format is a string and matches new-style pattern
  - Convert it to a callable (e.g., lambda x: float_format.format(x))
  - Ensure compatibility with NaN values and mixed data types
  - Improves formatting output for floats when exporting to CSV

  Example:

      df = pd.DataFrame([1234.56789, 9876.54321])
      df.to_csv(float_format="{:,.2f}")  # now outputs formatted values like 1,234.57

  Co-authored-by: Pedro Santos <[email protected]>
* update benchmark test
* fixed pre commit
* fixed offsets.pyx
* fixed tests to windows
* Update pandas/io/formats/format.py

  Co-authored-by: Matthew Roeschke <[email protected]>
* Update pandas/io/formats/format.py

  Co-authored-by: Matthew Roeschke <[email protected]>
* Update pandas/io/formats/format.py

  Co-authored-by: Matthew Roeschke <[email protected]>
* updated v3.0.0.rst and fixed tm.assert_produces_warning
* fixed test_new_style_with_mixed_types_in_column

  added match to assert_produces_warning
* Update doc/source/whatsnew/v3.0.0.rst (removed reference to this PR)

  Co-authored-by: Simon Hawkins <[email protected]>
* fixed pre-commit
* removed tm.assert_produces_warning
* fixed space
* fixed pre-commit

---------

Co-authored-by: Pedro Santos <[email protected]>
Co-authored-by: Matthew Roeschke <[email protected]>
Co-authored-by: Simon Hawkins <[email protected]>
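The conversion this commit message describes (turning a new-style format string into a callable) can be sketched in plain Python. This is not the pandas implementation: `to_formatter` is a hypothetical helper, and the `"{" in float_format` test is an assumed, simplified way of detecting a new-style string.

```python
from typing import Callable, Optional, Union

FloatFormat = Optional[Union[str, Callable[[float], str]]]


def to_formatter(float_format: FloatFormat) -> Callable[[float], str]:
    """Hypothetical helper: normalise a float_format argument to a callable."""
    if float_format is None:
        return str
    if callable(float_format):
        return float_format
    if "{" in float_format:
        # Assumed detection rule for new-style strings such as "{:,.2f}".
        return lambda x: float_format.format(x)
    # Otherwise treat the string as an old-style "%" format such as "%.2f".
    return lambda x: float_format % x


new_style = to_formatter("{:,.2f}")
old_style = to_formatter("%.2f")

formatted_new = [new_style(v) for v in (1234.56789, 9876.54321)]
formatted_old = old_style(1234.56789)
```

With the values from the commit message's example, `formatted_new` holds `"1,234.57"` and `"9,876.54"`, while the old-style formatter yields `"1234.57"` without the thousands separator.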

…andas-dev#61727) * TST: update expecteds for using_string_dtype to fix xfails * Update to_dict_of_blocks test to hardcode object dtype * Comment * Split test, update expected, targeted xfails * Update json test * revert commented-out


…ment (pandas-dev#61827) * DOC: Correct error message in AbstractMethodError for methodtype argument * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>


jbrockmendel · 2025-07-18T20:49:31Z

pandas/tests/series/methods/test_describe.py

+    def test_describe_multiple_dtypes(self):
+        """
+        GH61707: describe() doesn't work on EAs which generate
+        statistics with multiple dtypes.


nitpick: can this be a comment instead of a docstring?

jbrockmendel · 2025-07-18T20:51:31Z

pandas/core/methods/describe.py

@@ -215,6 +216,14 @@ def reorder_columns(ldesc: Sequence[Series]) -> list[Hashable]:
    return names


+def has_multiple_internal_dtypes(d: list[Any]) -> bool:


I think this can be inlined, since it is only used once.

jbrockmendel · 2025-07-18T20:52:15Z

pandas/core/methods/describe.py

@@ -251,6 +260,10 @@ def describe_numeric_1d(series: Series, percentiles: Sequence[float]) -> Series:
                import pyarrow as pa

                dtype = ArrowDtype(pa.float64())
+        elif has_multiple_internal_dtypes(d):
+            # GH61707: describe() doesn't work on EAs
+            # with multiple internal dtypes, so return object dtype


Is the relevant characteristic "multiple internal dtypes" or "entries that can't be cast to Float64"?

The latter makes more sense.
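A minimal sketch of the second reading, "entries that can't be cast to Float64", assuming castability is probed by attempting a plain `float()` conversion on each statistic. Both `all_castable_to_float` and `FakeQuantity` are hypothetical names introduced for illustration; the `TypeError` raised here only stands in for the DimensionalityError mentioned in the PR description.

```python
from typing import Any


class FakeQuantity:
    """Hypothetical unit-carrying scalar that refuses conversion to float."""

    def __init__(self, magnitude: float, unit: str) -> None:
        self.magnitude = magnitude
        self.unit = unit

    def __float__(self) -> float:
        # Stand-in for the error a real unit library might raise here.
        raise TypeError(f"cannot convert quantity with unit {self.unit!r} to float")


def all_castable_to_float(values: list[Any]) -> bool:
    """Return True only if every entry converts to a Python float."""
    for value in values:
        try:
            float(value)
        except (TypeError, ValueError):
            return False
    return True


plain_stats = [3, 2.5, 1.0, 4.0]
unit_stats = [3, FakeQuantity(2.5, "kg"), FakeQuantity(1.0, "kg")]

plain_ok = all_castable_to_float(plain_stats)
unit_ok = all_castable_to_float(unit_stats)
```

Framed this way, the check describes the actual failure (the cast) rather than a proxy for it, and a list mixing ints and floats still passes.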


kernelism and others added 30 commits July 2, 2025 17:02

Fix describe() for ExtensionArrays with multiple internal dtypes

9b3c6ac

chore: remove redundant words in comment (pandas-dev#61759)

3550556

Signed-off-by: ianlv <[email protected]>

DEPS: bump pyarrow minimum version from 10.0 to 12.0 (pandas-dev#61723)

22f12fc

DEPR: object inference in to_stata (pandas-dev#56536)

b91fa1d

* DEPR: object inference in to_stata * Whatsnew * Fix broken test * alphabetize

ENH: Allow third-party packages to register IO engines (pandas-dev#61642

9dcce63

)

Revert "ENH: Allow third-party packages to register IO engines" (pand…

391107a

…as-dev#61767) Revert "ENH: Allow third-party packages to register IO engines (pandas-dev#61642)" This reverts commit 9dcce63.

BUG: NA.__and__, __or__, __xor__ with np.bool_ objects (pandas-dev#61768

51763f9

)

BUG: Fix unpickling of string dtypes of legacy pandas versions (panda…

e5a1c10

…s-dev#61770)

DOC: add pandas 3.0 migration guide for the string dtype (pandas-dev#…

2b471c8

…61705) Co-authored-by: Simon Hawkins <[email protected]> Co-authored-by: jbrockmendel <[email protected]>

DOC: add section about upcoming pandas 3.0 changes (string dtype, CoW…

0faaf5c

…) to 2.3 whatsnew notes (pandas-dev#61795) Co-authored-by: Simon Hawkins <[email protected]>

BUG[string]: incorrect index downcast in DataFrame.join (pandas-dev#6…

cf1a11c

…1771) Co-authored-by: Joris Van den Bossche <[email protected]>

TST: update expected dtype for sum of decimals with pyarrow 21+ (pand…

ebca3c5

…as-dev#61799)

DOC: Add link to WebGL in pandas ecosystem (pandas-dev#61790)

b9d5732

CLN: remove and udpate for outdated _item_cache (pandas-dev#61789)

be2cb8c

* CLN: remove and udpate for outdated _item_cache * CLN: remove outdated _item_cache in comment * CLN: rollback unittest unralted to _item_cache

DOC: prepare 2.3.1 whatsnew notes for release (pandas-dev#61794)

ff8a607

PERF: avoid object-dtype path in ArrowEA._explode (pandas-dev#61786)

d21ad1a

* PERF: avoid object-dtype path in ArrowEA._explode * typo fixup

TST: option_context bug on Mac GH#58055 (pandas-dev#61779)

16fd208

BUG: Decimal(NaN) incorrectly allowed in ArrowEA constructor with tim… (

b5e441e

pandas-dev#61773) * BUG: Decimal(NaN) incorrectly allowed in ArrowEA constructor with timestamp type * GH ref * BUG: ArrowEA constructor with timestamp type * mypy fixup * mypy fixup

REF: remove unreachable, stronger typing in parsers.pyx (pandas-dev#6…

fea4f5b

…1785) * REF: remove unreachable, stronger typing in parsers.pyx * mypy fixup

DEPS: Bump NumPy and tzdata (pandas-dev#61806)

d1a245c

* Bump numpy * Bump numpy * Bump tzdata * ignore pytables usage, update xfail condition

CI: Remove PyPy references in CI testing (pandas-dev#61814)

f94b430

BUG: Fix Index.equals between object and string (pandas-dev#61541)

b876c67

BUG: Require sample weights to sum to less than 1 when replace = True (…

9da2c8f

…pandas-dev#61582)

DOC: Update link to pytz documentation (pandas-dev#61821)

d785a3d

* DOC: Update link to pytz documentation * Update the pytz link per the suggestion

REF: separate out helpers in libparser (pandas-dev#61832)

337d5fe

TST: Fix test_mask_stringdtype (pandas-dev#61830)

688e2a0

Fix test

TST: enable 2D tests for MaskedArrays, fix+test shift (pandas-dev#61826)

e1328fc

heoh and others added 20 commits July 11, 2025 15:08

BUG: Fix infer_dtype result for float with embedded pd.NA (pandas-dev…

fd7bfaa

…#61624)

DOC: rm excessive backtick (pandas-dev#61839)

da7f2be

fix(doc): rm excessive backtick

DOC: Update README.md to reference issues related to 'good first issu…

4f2aa4d

…e' and 'Docs' properly (pandas-dev#61836) * DOC: Update README.md to proper link to issues related to Docs * DOC: Update README.md to proper link to issues related to 'good first issue'

BUG: Fix pivot_table margins to include NaN groups when dropna=False (p…

a2315af

…andas-dev#61524)

Remove incorrect line in Series init docstring (pandas-dev#61849)

bc6ad14

TST(string dtype): Resolve xfails in test_from_dummies (pandas-dev#60694

1d153bb

)

API: np.isinf on Index return Index[bool] (pandas-dev#61874)

43711d5

DOC: Add Raises section to to_numeric docstring (pandas-dev#61868)

2c89a91

String dtype: turn on by default (pandas-dev#61722)

13bba34

DOC: show Parquet examples with default engine (without explicit pyar…

598b7d1

…row/fastparquet engine keyword) (pandas-dev#61877)

DOC: update Parquet IO user guide on index handling and type support …

88cb152

…across engines (pandas-dev#61878)

ERR: improve exception message from timedelta64-datetime64 (pandas-de…

042ac78

…v#61876)

BUG: Timedelta with invalid keyword (pandas-dev#61883)

3e9237c

API: Index.__cmp__(Series) return NotImplemented (pandas-dev#61884)

d5eab1b

DOC: make doc build run with string dtype enabled (pandas-dev#61864)

90b1c5d

DOC: fix doctests for string dtype changes (top-level) (pandas-dev#61887

6537afe

)

BUG: disallow exotic np.datetime64 unit (pandas-dev#61882)

6fca116

API: IncompatibleFrequency subclass TypeError (pandas-dev#61875)

4b18266

BUG: If both index and axis are passed to DataFrame.drop, raise a cle…

6a6a1ba

…ar error (pandas-dev#61855) Co-authored-by: Khemkaran <[email protected]>

jbrockmendel reviewed Jul 18, 2025


jorisvandenbossche and others added 2 commits July 19, 2025 12:34

BUG: fix padding for string categories in CategoricalIndex repr (pand…

8de38e8

…as-dev#61894)

61760: merge with main

9edf890

kernelism requested review from rhshadrach and mroeschke as code owners July 20, 2025 05:45

kernelism closed this Jul 20, 2025

kernelism deleted the describe-EA-multiple-dtypes-fix branch July 20, 2025 05:48

kernelism mentioned this pull request Jul 20, 2025

BUG: .describe() doesn't work for EAs #61707 #61910

Open

5 tasks
